$RCSfile: C2Oberon.txt $
Description: Hints for converting C code into Oberon

Created by: fjc (Frank Copeland)
$Revision: 1.2 $
$Author: fjc $
$Date: 1994/05/13 19:26:48 $

Copyright © 1994, Frank Copeland.
This file is part of Oberon-A.
See Oberon-A.doc for conditions of use and distribution.
________________________________________________________________________

[This document is only partly complete (there was more, but DME ate it).
:-(  It is here because what there is might still be useful.  It will be
finished for a future release.  FJC]

Introduction
------------

This document contains advice for translating source code written in C
into Oberon.  It covers both simple syntax conversion and broader issues
of program organisation.

C and Oberon both belong to the same tradition of imperative, procedural
languages that trace their origins back to Algol in the late 1950s.
They are both general purpose languages and are sufficiently low-level
that they can be used as system-programming languages (C is more
consciously directed to this use).  Where they mainly differ is in the
areas of modularity and type safety; Oberon is stronger in both.

In most cases the syntaxes are similar enough to allow a straightforward
translation from one language to the other.  A text editor with a macro
facility can help simplify this task.  Difficulties arise when
encountering constructs common in C which are missing in Oberon, such as
union types and unsigned integers.  Other, more subtle difficulties may
arise as a result of C's less strict (often non-existent) type checking.
Finally, C's approach to modular programming is completely different
from Oberon's, and may require a complete restructuring of the source
code.

Simple Data Types
-----------------

Simple data types are the basic building blocks of all data structures.
As they are determined by the underlying architecture common to almost
all modern computers, C and Oberon have a very similar range of basic
types.
However, while the ANSI C standard specifies minimum sizes and ranges
for all of these types, Oberon leaves much more up to the implementor.
The equivalences given are those for Amiga C compilers and the Oberon-A
compiler.

The following table helps to explain the equivalences:

  ANSI C          Min. Size   Oberon-A
  ------          ---------   --------
  char            1 byte      When used to hold ASCII values, the
                              equivalent is CHAR; when used as a small
                              integer, the equivalent is SHORTINT.
  unsigned char   1 byte      When used to hold ASCII values, the
                              equivalent is CHAR; when used as a small
                              integer, the equivalent is BYTE.  For
                              BYTE see below.
  short           2 bytes     INTEGER.
  int             2 bytes     INTEGER or LONGINT.  Some C compilers
                              have 16-bit ints, others have 32-bit
                              ints.  For 16-bit ints, use INTEGER,
                              otherwise use LONGINT.
  unsigned int    2 bytes     No equivalent.  See below.
  long            4 bytes     LONGINT.
  unsigned long   4 bytes     No equivalent.  See below.
  float           6 digits    REAL.
  double          10 digits   LONGREAL.
  long double     10 digits   LONGREAL.

The BYTE type is strictly limited in Oberon (it is removed in Oberon-2).
It may be used to represent an unsigned byte-sized value (0-255), but
the only operation allowed is assignment.  To perform arithmetic on BYTE
values, you must first assign them to INTEGER variables.

Oberon does not allow for unsigned integers apart from the limited BYTE
type.  In many cases this is no problem since a signed integer can be
used instead without problems, although it may be necessary to use a
larger type (INTEGER instead of SHORTINT, for instance).  The main
problem occurs when you must specify a type with the same size as the
unsigned type; this often occurs when declaring Oberon equivalents for
the Amiga system data structures.  You cannot use a larger type in this
case; the system WILL crash if you do.

Often the operating system defines unsigned constants to be placed in
these variables with values that are larger than the maximum for a
signed integer of that size.
These must be converted to negative values using this formula:

  <value> - (<maximum unsigned value> + 1)

For example, an unsigned int constant 0xFFFE or 65534 converts to:

  (65534 - (65535 + 1)) = -2

Often unsigned integers are used as bit fields, which are directly
equivalent to the Oberon SET type.  An unsigned char bit field is
equivalent to the SYSTEM.BYTESET type; an unsigned int is equivalent to
the SYSTEM.WORDSET type; an unsigned long corresponds to the SET type.
If the exact size of the variable doesn't matter, use the SET type; the
other two types are unique to the Oberon-A implementation and are not
portable.

C has no direct equivalent to Oberon's BOOLEAN type, but ints are often
used for the same purpose.  When they are, a value of zero equates to
FALSE and any other value equates to TRUE.

Oberon has no equivalent to C's enumerated data types, but they can be
simulated very easily using integer constants.  For instance:

  enum days { Sun, Mon, Tues, Wed, Thurs, Fri, Sat };

becomes:

  CONST
    Sun = 0; Mon = 1; Tues = 2; Wed = 3; Thurs = 4; Fri = 5; Sat = 6;

Constant values
---------------

Character literals in C are defined using single quotes: 'a', 'Z', etc.
String literals use double quotes: "This is a string".  Oberon uses
double quotes for both.  This can create an unexpected difficulty.
Oberon has no way of distinguishing a string literal with a single
character in it from a character literal except from the context in
which it is used.  If the compiler does not contain special code to deal
with the anomaly, perfectly legal code may generate errors.  (Oberon-A
now handles this anomaly.)

String and character literals in C may contain escape sequences such as
'\n' and "\x5c".  Oberon does not recognise escape sequences (but
Oberon-A does, as an extension; see OC.doc).  Character literals in
Oberon may be expressed as hexadecimal ASCII codes instead: '\n' becomes
0AX.

In C, any numeric literal starting with "0x" or "0X" is a hexadecimal
value.
Upper and lower case characters can be used for the hexadecimal digits.
Oberon uses an "H" character at the end of the literal to indicate a
hexadecimal number and hex digits must be in upper case.  A number must
start with a numeric character, the same as in C.  Example: "0xa5d3"
becomes "0A5D3H".

Any integer constant in C starting with "0" (except for hex constants,
obviously) is interpreted as an octal number.  Oberon has no equivalent
for this and the constant must be converted into its decimal or
hexadecimal equivalent.

A "U" at the end of an integer constant indicates an unsigned number.
Oberon has no equivalent for this (see above).

An "L" at the end of an integer constant indicates that it has a long
int type.  This is unnecessary in Oberon, which handles all the
necessary type conversions automagically.

Floating point constants are generally defined the same way in C and
Oberon, except that an exponent must be indicated with an uppercase "E"
in Oberon.  C constants usually are of type double; Oberon defaults to
type REAL.  To specify a LONGREAL constant, Oberon uses a "D" in the
exponent instead of an "E".  C uses an "L" at the end of a floating
point constant to indicate it is a long double type; remove this when
converting to Oberon.

Bit field constants in C are usually defined in one of two ways.  One is
to use the form: "(1<> i3              i1 := ASH (i2, -i3), or
                                       i1 := SYSTEM.LSH (i2, -i3)
  greater than       a > b             a > b
  greater or equal   a >= b            a >= b
  less than          a < b             a < b
  less or equal      a <= b            a <= b
  equal              a == b            a = b
  not equal          a != b            a # b
  bitwise AND        i1 = i2 & i3      i1 := SYSTEM.AND (i2, i3)
                     s1 = s2 & s3      s1 := s2 * s3
                     s1 = s2 & ~s3     s1 := s2 - s3
  bitwise OR         i1 = i2 | i3      i1 := SYSTEM.LOR (i2, i3)
                     s1 = s2 | s3      s1 := s2 + s3
  bitwise XOR        i1 = i2 ^ i3      i1 := SYSTEM.XOR (i2, i3)
                     s1 = s2 ^ s3      s1 := s2 / s3
  logical AND        b1 && b2          b1 & b2
  logical OR         b1 || b2          b1 OR b2
  assignment         a = b             a := b

In C, assignment (=) is an operator that may be used in an expression.
In Oberon, assignment (:=) is a statement.  This means that in C you can
say:

  if (c = getchar()) ...

while in Oberon you must use:

  c := getchar (); IF c # 0X THEN ...

You may have to pay close attention to the C increment and decrement
operators.  If the operator is placed before the variable, the operation
is performed before the rest of the expression is evaluated; if after,
the expression is evaluated first.  This determines where the Oberon INC
or DEC procedure should be placed.

Oberon and C both use short-circuit evaluation when processing boolean
expressions involving logical AND and logical OR operators.  This means
that the whole expression may not be evaluated if its result can be
determined before the end.  For example, in the expression (A & B), if A
evaluates to FALSE, B is never evaluated because it is unnecessary; the
expression result will be FALSE regardless of the value of B.  When
translating such expressions, pay close attention to the operator
precedence rules to make sure that the Oberon expression has the same
logic as the C expression.

In C you can replace any expression of the form:

  A = A <op> B

where <op> is a binary operator, with:

  A <op>= B

In Oberon, you must convert such an expression back into its first form.
For example:

  (C) A *= B  =>  A := A * B (Oberon)

Block Statement
---------------

C has the concept of a block statement, which is a sequence of
statements enclosed in braces that may be used anywhere in place of a
single statement.  It takes the form:

  {
    <statement>;
    <statement>;
    ...
    <statement>;
  }

The closest equivalent in Oberon is the main body of a procedure or
module, which is a sequence of statements bracketed by BEGIN and END:

  BEGIN
    <statement>;
    <statement>;
    ...
    <statement>
  END

C also makes use of block statements in if, for and while statements.
Oberon has its own syntax for these statements; see below.

C block statements can also include variable declarations whose scope is
limited to the block statement.
In Oberon, such a block must be redefined as a local procedure and given
a name which is used in place of the block statement.  See Subroutines
below.

Note also that in Oberon semicolons are statement _separators_ while in
C they are _terminators_.  In C, a statement must end with a semicolon;
in Oberon, a semicolon is only necessary if another statement follows
it.

Conditional Statements
----------------------

C and Oberon both have the if/then/else and case conditional statements.
The different syntaxes for if/then/else are:

  ANSI C                  Oberon

  if (<expression>)       IF <expression> THEN
    <statement>;            <statement> {; <statement>}
  [else                   [ELSE
    <statement>;]           <statement> {; <statement>}]
                          END

In C, <expression> does not have to be a boolean expression; it only
needs to evaluate to a zero or non-zero value.  Zero is treated as
FALSE, non-zero is TRUE.  When translating to Oberon, the expression
must be converted to a boolean expression.  For example:

  if (v) ... (C)  =>  IF v # 0 THEN ... (Oberon)

In C, the <statement> may be a block statement, in which case the
semicolon is unnecessary.  In Oberon, the block statement is replaced by
a sequence of statements, separated by semicolons.

In both languages the else part is optional.  In Oberon the END is
mandatory.

In C you may see something like:

  if (<expression>)
    <statement>;
  else if (<expression>)
    <statement>;

In Oberon, this would be expressed as:

  IF <expression> THEN
    <statement>
  ELSIF <expression> THEN
    <statement>
  END

The equivalent case statements are:

  ANSI C                            Oberon

  switch (<expression>) {           CASE <expression> OF
    case <constant> : <statements>    <constants> : <statements> |
    case <constant> : <statements>    <constants> : <statements> |
    ...                               ...
    case <constant> : <statements>    <constants> : <statements>
    [default : <statements>]        [ELSE <statements>]
  }                                 END

In C, only one constant item is allowed per case.  In Oberon each case
may include a list of constants, including ranges of values.

In C, ALL statements after the activated case are executed, unless a
break statement is encountered.  In Oberon, only the statements
associated with the activated case are executed.  To illustrate this,
the following statements are equivalent:

  switch (today) {                  CASE today OF
    case Mon :                        Mon .. Fri :
    case Tue :                          StdIO.WriteStr ("go work!")
    case Wed :                      |
    case Thur:                        Sat, Sun :
    case Fri :                          IF today = Sat THEN
      puts ("go work!");                  StdIO.WriteStr ("clean the ");
      break;                              StdIO.WriteStr ("yard and ");
    case Sat :                          END;
      printf                            StdIO.WriteStr ("relax!");
        ( "%s",                         StdIO.WriteLn ();
          "clean the yard and ");   END;
    case Sun :
      puts ("relax!");
  }

Note the use of the "|" character to separate the cases.

The default part in C and the ELSE part in Oberon are both optional and
are executed only if none of the other cases are activated.  If no
default is provided and no case is activated, C simply continues after
the switch statement.  Oberon will cause a run-time error if no cases
are activated and there is no ELSE.  To get the same behaviour as C,
include an ELSE with an empty statement (i.e. nothing) after it.

Iteration
---------

Subroutines
-----------

Data Structures
---------------

Program Structure
-----------------

Standard Libraries
------------------

Other Issues
------------

Example programs
----------------
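
As a small worked example of the unsigned constant conversion described
under Simple Data Types, the following C sketch computes the signed
value that shares a bit pattern with an unsigned constant, so that the
result can be copied into an Oberon CONST declaration.  The function
name as_signed and its bits parameter are made up for this illustration;
they are not part of Oberon-A or of any C library.  The sketch assumes
two's-complement integers and field widths of 1 to 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not from Oberon-A): return the signed value that
   has the same bit pattern as `value` in a field `bits` wide.
   Applies the rule  value - (maximum unsigned value + 1)  whenever
   `value` is larger than the maximum signed value for that width. */
int32_t as_signed(uint32_t value, unsigned bits)
{
    assert(bits >= 1 && bits <= 32);

    /* modulus is (maximum unsigned value + 1) for the field width */
    uint64_t modulus = (uint64_t)1 << bits;
    uint64_t signed_max = (modulus >> 1) - 1;

    /* the constant must actually fit in the field */
    assert(value < modulus);

    if (value > signed_max)
        return (int32_t)((int64_t)value - (int64_t)modulus);

    /* small values are the same whether signed or unsigned */
    return (int32_t)value;
}
```

For the constant used earlier, as_signed (0xFFFE, 16) gives -2, so the
16-bit Oberon declaration would use -2 in place of 0xFFFE.  Values that
already fit in the signed range, such as 100, come back unchanged.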